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Abstract 

Bayesian Model Calibration is used to revisit the problem of scaling factor calibration for 
semi-empirical correction of ab initio calculations. A particular attention is devoted to 
uncertainty evaluation for scaling factors, and to their effect on prediction of observables 
involving scaled properties. We argue that linear models used for calibration of scaling 
factors are generally not statistically valid, in the sense that they are not able to fit 
calibration data within their uncertainty limits. Uncertainty evaluation and uncertainty 
propagation by statistical methods from such invalid models are doomed to failure. To 
relieve this problem, a stochastic function is included in the model to account for model 
inadequacy, according to the Bayesian Model Calibration approach. In this framework, 
we demonstrate that standard calibration summary statistics, as optimal scaling factor 
and root mean square, can be safely used for uncertainty propagation only when large cal- 
ibration sets of precise data are used. For small datasets containing a few dozens of data, 
a more accurate formula is provided which involves scaling factor calibration uncertainty 
For measurement uncertainties larger than model inadequacy, the problem can be reduced 
to a weighted least squares analysis. For intermediate cases, no analytical estimators were 
found, and numerical Bayesian estimation of parameters has to be used. 
Keywords: Bayesian data analysis, Model calibration, Scaling factor, Vibrational fre- 
quency 
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1 Introduction 



The final stage in the development of a model, after calibration on an experimental dataset and 
proper validation, consists in its use for prediction: "If the experimental dataset is sufficiently 
broad, there is a reasonable expectation that the results will be accurate to something like the 
target accuracy". 1 The estimation of uncertainty for the results of computational chemistry is 
indeed of paramount importance, notably to their use in multiscale modeling. 2 ' 3 The concept 
of Virtual Measurement has been introduced by Irikura 4 to take advantage of the standardized 
procedures defined by the Guide to the Expression of Uncertainty in Measurement (also known 
as "the GUM"), 5 and to apply them in the case of quantum chemistry modeling. Random 
uncertainties have been shown to be very small, and the major uncertainty factor are biases 
due to basis-set and/or theory limitations. For quantum chemistry to be predictive, i.e. to 
be able to predict observables with confidence intervals, one has to correct for these biases, 
which commonly requires semi-empirical corrections. Uncertainty attached to these corrections 
have to be considered in the final uncertainty budget, to which they constitute often a major 
contribution. 

Scaling of harmonic vibrational frequencies is an important example of calibration method 
in computational chemistry, where estimation of a vibrational frequency v is obtained by mul- 
tiplying the corresponding harmonic vibrational frequency u, routinely calculated by standard 
computational chemistry codes, by an empirical scaling factor s (Fig. [I]) 



Optimal scaling factors have been computed for extensive sets of theory/basis-set combina- 
tions. 6-9 In the majority of publications about scaling factors, two summary statistics are 
provided for each theory/basis-set combination: the optimal scaling factor and the root mean 
squares. From a reference dataset of experimental {^j}^ and calculated vibrational frequencies 
{u)i}i the optimal scaling factor obtained by the least-squares procedure is 



V = UJ s 



(1) 




(2) 
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and the quality of the correction is estimated by the root mean squares (rms) value 




To our knowledge, these values are not explicitly used for uncertainty propagation, but the rms 
provides an estimate of the residual uncertainty resulting from the scaling correction ("some- 
thing like the target accuracy", 1 or "a surrogate for uncertainty" according to 10 ). 

In a recent article (hereafter referred to as Paper I), Irikura et al. 10 treated the problem 
of uncertainty propagation from scaled frequencies, and, using procedures advocated by the 
GUM, 5 insisted that the scaling factor itself is subject to uncertainty, and proposed that this 
uncertainty is the major contribution to prediction uncertainty On the practical side, prediction 
uncertainty would be proportional to the calculated harmonic frequency; on the factual side, 
these authors declare that scaling factors for vibrational frequencies are accurate to only two 
significant figures, whereas all other studies overstate their precision by reporting them with 
four figures. This approach has been adopted by the National Institute of Standards and 
Technology (NIST) and put into practice in the Computational Chemistry Comparison and 
Benchmark DataBase (CCCBDB), 9 section XIII. C. 2, where all scaling factors are provided 
with uncertainties derived according to the procedure of Paper I. 

In the present paper, we revisit the problem of scaling factor calibration and we recast it in 
the Bayesian Model Calibration framework, reputed for providing consistent uncertainty evalu- 
ation and propagation. 11-13 Section [2] presents the methodological elements used for calibration 
and validation procedures, which are applied to representative vibrational frequency and zero 
point vibrational energy datasets in Section [3l Bayesian calculations used in this study are 
fairly standard, but for the sake of completeness and for readers unfamiliar with statistical 
modeling, details are provided in the Appendix . 
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Figure 1: Correlation plot between calculated harmonic frequencies cjj and measured frequencies 
Vi for a set of vibrations extracted from the CCCBDB for the HF/6-31G* combination of 
theory/basis-set (dots). The full line is the regression line v = su; the dashed line is a visual 
aid to appreciate the bias. 
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2 Methods 



2.1 Statistical model for scaling factor calibration 

Considering a measured frequency u obs , one can assume that it is related to the true value u true 
by 

v obs = v true + e ( 4 ) 

where e ~ N(0, p 2 ) is a normal random variable centered at zero, with variance p 2 , representing 
the measurement uncertainty in the hypothesis of additive white noise. In the following, we 
assume that p is known beforehand. 

Calculated harmonic vibrational frequencies u are affected by random errors, related to nu- 
merical convergence defined by non-zero thresholds and the choice of starting point in geometry 
optimization, and to non-zero thresholds in wave-function optimization. 4 It has been shown 
that the associated uncertainties are negligible when compared to the measurement uncertain- 
ties. 4 In the following, one can thus assume that the harmonic vibrational frequencies, being 
deterministically calculated, have fixed values. 

If one makes the hypothesis of a linear relationship between p true and u, the calibration 
model resulting from this analysis is 



v 



obis 



suj + e (5) 



For a single frequency, there is an optimal scaling parameter s = v ohs jio. As 

v obs ig 

uncertain, 

with standard uncertainty p, the value of s cannot be known exactly and has a standard 
uncertainty u s = p/ui. For a calibration dataset with uniform measurement uncertainty p, it 
can be shown that the optimal value for s is still given by Eq. [2j and its standard uncertainty 



by u s = /)/ySj =1 w,- (c.f. Appendix IA.3I) . Applicability of this formula is subject to one 



condition: the model (Eq. [5]) should be statistically valid. This means that the residuals 
fi/obs _ gk^j should have a normal distribution with variance p 2 . Normality is not always 
verified, but most important, the variance condition is typically violated when precise data 
are used for calibration. The linear model (Eq. [5]) is typically unable to reproduce a given 
set of measured frequencies within their uncertainty range. In such a case, the width of the 
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distribution of residuals is dominated by model misfit, not by measurement uncertainty, and 
the model is invalid. 

An option would be to seach for a better model than the linear scaling, which is beyond 
the purpose of this study. Instead, we consider here that the misfit is not deterministically 
predictable. The solution to preserve the linear scaling model in a statistically valid model, 
is thus to introduce a stochastic variable £ to represent the discrepancy between model and 
observations 

u ° bs = SUJ + £ + e (6) 

This equation is similar to the basic statistical model introduced by Kennedy and O'Hagan for 
Bayesian calibration of model outputs. 11 The discrepancy variable could formally depend on u, 
but we observed that the residuals between model and observations are not notably frequency 
dependent (Fig. [2]), and the same is assumed for £ which is considered null in average with 
variance a 2 

£~N(0,a 2 ) (7) 
The calibration equation depends thus on two unknown parameters s and a. 

2.2 Uncertainty propagation 

The calibration model (Eq. [6]) being linear with regard to uncertain variables s, £ and e, and 
in the hypothesis of normally distributed uncertainties, we can use the standard uncertainty 
propagation rules 5 to estimate the value and uncertainty on predicted vibrational frequencies: 

77 = sou (8) 
u\ ( ov \ l av 



Var(u) = Var(s) + |_j Var(0 + Var(e) (9) 

= oo 2 Var(s) + a 2 + p 2 (10) 

The terms due to covariances have been omitted from this equation as the three variables are 
independent by hypothesis. In order to provide evaluated predictions of vibrational frequencies, 
we need therefore to estimate s, u 2 = Var(s) and a 2 from the calibration dataset. This is done 
in the next section, using Bayesian model calibration. 
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Figure 2: Residuals between calculated harmonic frequencies Ui and measured frequencies for 
a set of vibrations extracted from the CCCBDB for the HF/6-31G* combination of theory/basis- 
set (dots). (A) bottom: residuals as a function of to. In order to suppress the grouping effect 
linked with frequencies, the residuals were also plotted as a function of their order in the 
reference set (A)-top. (B): plot of the cumulative density function (CDF) for the residuals 
against a normal CDF shows that globally there is very little deviation from normality. 
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2.3 Bayesian Model Calibration (BMC) 

2.3.1 General case 

Starting from the statistical calibration model (Eq. [6]), one derives the following expression for 
the posterior probability density function of the parameters, given a set of N measured and 
calculated frequencies (details are reported in Appendix IA.1I) 

pM oc — L- 2 , /2 ex P f-if: ( ^ 2 ;y )2 ) (id 

° n l= i + Pi) v 2 i=i ct + ^ / 

Inferences from this pdf have to be done generally by numerical methods, 12 as will be applied 
later to zero point vibrational energies. For vibrational frequencies, we first derive an adapted, 
simplified, model. 

2.3.2 The case of negligible measurement uncertainties 

In the commonly met situation where the amplitude of £ is much larger than the other sources 
of uncertainty (cr 3> p) , we can consider the approximate measurement model 

v* B = su, i + £ (12) 

for which the posterior pdf can be simplified to 

pM cc exp exp (- ^^ ) (13) 

This expression is amenable to analytical derivation of the estimates for the parameters (see 
Appendix EO): 

• the average value for s is identical to the optimal value provided by least-squares analysis 

s = s; 

• the standard uncertainty on s, u s is related to the rms 7 by the formula 

Ms = 7v /AV((iV-3)£^ 2 ); 

• the estimate of a 2 is related to 7 according to o 2 = 7 2 N/ (N — 3). 
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Using these estimates, we can establish an explicit expression for the standard uncertainty of 
v. 



Confidence intervals can be defined for prediction purpose, e.g. the 95% confidence interval for 
assuming the normality of uncertainty is given by 



In such conditions, it is thus possible to derive directly confidence intervals from the the sum- 
mary calibration statistics s and 7 provided by the reference literature. 6-8 

2.4 The Multiplicative Uncertainty (MU) method 

Irikura et al., 10 after a thorough analysis of the uncertainty sources, established that the major 
contribution to uncertainty propagation would be the uncertainty on the scaling factor s. They 
estimate u s from the weighted variance of s with weights a, = uf. This weighting scheme is 
derived in two steps: (1) they propose that the probability density function (pdf ) for the scaling 
factor is a linear combination of pdf 's for individual scaling factors in the reference set; and (2) 
from the comparison of the expression of the average value derived from this proposition with 
the least-squares solution EqOJ This way, they obtain a standard uncertainty 




(14) 



CIg^{u) = [scu — 1.96 u v , sou + 1.96 u u ] 



(15) 





1/2 



(17) 



which can be related to 7 by u* ~ 7 \J N/ ^2 w f- This uncertainty is different, and generally 



smaller, than the dispersion of s values within the calibration set 




1/2 



(18) 
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The uncertainty on a scaled frequency is then reduced to 

u v ~ uju* s (19) 

hence the name of "Multiplicative Uncertainty" (MU) used hereafter. 

The salient feature of Eq. [19] is that uncertainty should be proportional to the calculated 
harmonic frequency. But, if one observes correlation plots for reference datasets {e.g. Fig. 
ID w > 2000 cm -1 ), this is not the case. When contrasted with the BMC approach, one 
understands that the multiplicative approach "absorbs", at least partially, model inadequacy 
in u*. It is thus implicitly assumed in this approach that the model (Eq. [5]) is in a statistical 
regime, although it is not. These points will be illustrated and discussed in the next section. 

3 Applications and discussion 
3.1 Vibrational frequencies 

We illustrate the BMC and MU approaches on the HF/6-31G* combination of theory/basis-set. 
The reference dataset, plotted in Fig. [U has been downloaded from the NIST/CCCBDB in 
July 2008. 9 

3.1.1 Calibration 

Using the BMC model and estimators on the reference set, we obtain s = 0.8984 ± 0.0005, 
and a = 45.3 ± 0.6 cm -1 (Table [[]), which is consistent with the value of the rms obtained 
by Merrick et al. 8 for the same theory/basis-set combination. For this dataset, the CCCBDB 
proposes s = 0.899 ± 0.025. We cross-checked this value using Eq. [T7] (Table [p. The standard 
uncertainty on s evaluated by both methods differ thus by a factor 50, which is expected to 
have noticeable effect on prediction uncertainty. In order to visualize this effect, we plotted the 
95 percent confidence intervals in both cases (Fig. [3]). It is immediately visible that the the MU 
approach has a tendency to underestimate uncertainty at low frequencies and to overestimate 
it at high frequencies. In comparison, the BMC approach is more balanced. 
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Summary stat. 



MU 



BMC 



s a s 7(cm x ) u s %CI 95 u s a {cm l ) %CI 95 
All frequencies (2738) 

Full set 0.8984 0.069 45.3 0.024 0.0005 45.3 

Calibration set 0.8986 45.3 0.024 - 0.0007 45.3 

Validation set - 83.0 - - 94.6 

Frequencies between 3180 and 3500 cm" 1 (479) 

Full set 0.9050 0.0087 28.7 0.0087 0.0004 28.8 

Calibration set 0.9052 23.3 0.0071 0.0005 23.4 

Validation set 95.0 95.4 



Table 1: Statistical estimates and validation for MU and BMC models for vibrational frequencies extracted from the CCCBDB for the HF/6- 
31G* combination of theory/basis-set. 



Baye si an Model C al ibr ati on MultiplieativeUncertainty 
1 1 1 1 I 1 1 1 




Ji 3000 4000 3000 4000 




Figure 3: Confidence intervals at 95% level calculated with Bayesian Model Calibration (left col- 
umn) and Multiplicative Uncertainty (right column) methods, in two representative frequency 
regions from the calibration dataset for the HF/6-31G* combination of theory /basis- set (dots). 
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3.1.2 Validation 

In order to comfort the observations of the previous section ( 13.1.11) . we split the calibration 
dataset in two subsets by using the odd indexes for calibration, and the even ones for validation 
(the frequencies being ordered per molecule, this provides a quasi-random selection with regard 
to frequency value). In this case, one gets slightly different values of the parameters, as reported 
in Table[U Using these values, we generate 95% confidence intervals and calculate the frequency 
of inclusion of the experimental values of the validation subset within these intervals (Fig. [3]). 
For a consistent estimator, one should find a frequency close to 95%. BMC succeeds for 94.7%, 
whereas the MU model succeeds for only 83% of the frequencies in the validation set (Table [1]). 
Considering the size of tha samples, the difference is significative, and the consistency of the 
multiplicative uncertainty approach in this context can be questioned. 

3.1.3 Test on a restricted frequency scale 

As stated in Paper I, "to apply the fractional bias correction, it is important to select a class 
of frequencies similar to the ones to be estimated". For instance, if one selects in the reference 
set only those frequencies between 3180 and 3500 cm -1 , one gets a much more uniform picture 
than with the full reference set. The calibration procedure was done with this limited set of 479 
frequencies, providing s = 0.9050 ± 0.0087 (Tabled]). In this case, we note that the uncertainty 
factor for s is practically identical to the standard deviation calculated from the sample (0.00869 
vs. 0.00871): u* s ~ a s . Due to the restricted frequency range, one has indeed uif / ~ 1/N 

(cf. EqsOandllH]). 

The restricted set has been split in two using index parity, as before. The scaling factor 
obtained by MU from the calibration subset is now s = 0.9052 ± 0.0071, and 95.0% of the 
validation frequencies fall within the 95% confidence interval. This result is now identical to 
the one obtained with BMC (Table [1]), the confidence intervals obtained by both methods being 
indist inguishable . 

It appears thus that in restrictive conditions, the MU method can be valid for reference 
sets where the individual scaling factors are uniformly distributed with regard to the harmonic 
frequencies. In such case however, the uncertainty is reduced to a conventional unweighted 
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standard deviation. 



3.1.4 Uncertainty propagation 

The relative importance of both factors u 2 s and a 2 (p = in this example) in Eq. [10] can be 
evaluated on the example of a calculated harmonic frequency in the higher range u = 3000 
cm -1 (Table [2]). In this case, the uncertainty on the scaling factor contributes only to one 
thousandth of the total prediction variance. 

For any practical purpose, the uncertainty on the scaling factor of vibrational frequencies can 
therefore be neglected. The uncertainty on a is also much too small to be relevant for confidence 
intervals calculation. One can thus safely apply the uncertainty propagation formula (Eq. [T6|) . 
using the rms provided by most reference articles dealing with scaling factors calibration. 6-8 



3.2 Zero Point Vibrational Energies 

We consider ZPVE additional test because the reference datasets are considerably 

smaller than for the vibrational frequencies (e.g. 39 molecules in the Zl set of Merrick et 
al. 8 ), which is expected to enhance the role of the uncertainty on the scaling factor. In the 
absence of a systematic review of measurement errors for ZPVE of polyatomic molecules, we 
consider in the following that they can be neglected. The uncertainties reported by Irikura 14 
for diatomic molecules are indeed very small (on the order of 0.01 cm" 1 ), but transposition to 
larger molecules is not straightforward. 

Using our measurement model and estimators on the reference set, one gets s = 0.9135 ± 
0.0027 and a = 0.73 ± 0.09 kJ mol" 1 (Table EJ), which is consistent with the rms obtained 
by Merrick et al. 8 for the HF/6-31G* theory/basis-set combination. Relative uncertainties on 
these parameters have been increased by one order of magnitude, compared with the vibrational 
frequencies case. The validation exercise shows once more that the Multiplicative Uncertainty 
model fails to provide correct confidence intervals, with a score of only 0.63 for CI95. 

In such a case of small reference dataset, it is interesting to check if the approximate formula 
(Eq. [TBI) for uncertainty propagation which was validated for large sets of vibrational frequen- 
cies begins to break down, i.e. the contribution of the multiplicative term involving u s stays 

14 



negligible or not, for the larger ZPVE values. For instance, if one considers a calculated ZPVE 
of 100 kJ mol" 1 , one has u v = ^(100 * 0.0027) 2 + 0.73 2 = 0.78 kJ mok 1 , to be compared to 
7 = 0.71 kJ mol" 1 . It is to be noted also that the uncertainty on a might contribute at the same 
level, u a = 0.09 kJ mol -1 . Taking all uncertainty sources into account trough Eq. EH by Monte 
Carlo Uncertainty Propagation, one gets u v = 0.77 kJ mol -1 . Apparently, the fluctuations of 
cr do not have an effect and only the uncertainty on the scaling factor has a noticeable effect, 
albeit quite small. 

In the same conditions, for the combination B3LYP/6-31G*, one gets 7 = 0.423 kJ mol" 1 
and u u = 0.472 kJ mol -1 , to be compared with an exact value of u v = 0.466 kJ mol -1 
(Table [3]). There is globally only a 10% increase compared to the rms 7. In this range of 
ZPVEs, 7 still provides a good approximation of the uncertainty factor (Table EJ). However, 
the amplitude of the discrepancy between 7 and u v will probably increase with the size of 
the molecule. In consequence, for uncertainty propagation with ZPVEs, notably for large 
molecules, it would be safer to use the full UP formula (Eq. IT4"]) . involving the multiplicative 
uncertainty factor. Compilations of scaling factors should thus report the easily calculated 
value of u s = 7 \J N/ ((N — 3) ^2 oaf), in addition to the rms 7. 

3.2.1 Non-negligible experimental uncertainties 

When the measurement uncertainty becomes comparable to the rms, model discrepancy should 
be vanishing, and confidence intervals for prediction should include the measurement uncer- 
tainty, i.e. u 2 u = w 2 u 2 + a 2 + p 2 . In the absence of an exhaustive compilation of experimental 
uncertainties on measured ZPVE, we performed simulations assuming a uniform uncertainty 
distribution over the full dataset. In order to test the sensitivity of the model parameters to 
this uncertainty, we repeated the estimations of previous section, using Eq. [271 f° r different 
values of p between 0.1 and 1.0 kJ mol" 1 . The results are reported in Fig. HI 

As expected from the properties of the posterior pdf, the average/optimal value of the 
scaling factor is insensitive to the amplitude of p. Moreover, we observe only a slight increase of 
u s from 0.002 to 0.004. A transition from a constant u s , defined by the p = limit, to a linear 
increase consistent with the weighted least squares limit (Eq. [501) is observed around p = 7, 
where both limits intersect. A closer look shows that the transition occurs indeed at values of 
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p slightly smaller than 7, in a region (p ~ 0.35) where u s displays a minimum. 

The evolution of a is more dramatic: it displays a sharp decrease and falls down to zero 
as soon as the measurement uncertainty reaches and overpasses the value of the rms 7. For 
values of p below 0.25, a obeys to the a 2 + p 2 = 7 2 law (represented as a dashed line in Fig. HJ), 
but the calculated decrease becomes much faster in the transition zone. The uncertainty on a 
increases notably in the transition region observed for s. 

In the limit of large experimental uncertainties, the uncertainty propagation formula can be 
written as 



In this region of large measurement uncertainty, the inadequacy variable £ becomes useless, as 
the calibration with the scaling factor alone enters a statistical regime. 

4 Conclusion 

A reanalysis of scaling factors calibration in the aim of providing quantified predictions shows 
that for large validation sets of accurate data, as used for vibrational frequencies, faithful 
prediction uncertainties can be derived from the standardly reported optimal scaling factor and 
rms. For much smaller datasets of a few dozens of data, as in the case of ZPVEs, uncertainty 
on the scaling factor should also be reported. The following limit formulas have been validated 
and are proposed for uncertainty propagation from a given calculated harmonic frequency uj: 

• large calibration sets of precise data (p <C 7): u u {uj) = 7; 

• small calibration sets of precise data (p 7): u v {uo) = 7 J (1 + uj 2 / ^\ to 2 ); 

• sets of data with large uniform uncertainty (p > 7): u u (u>) = p yfl + uj 2 / ^ uj 2 . 

A general estimation framework, based on Bayesian Model Calibration, has been defined for 
those cases where the limit conditions defined above are not met. 

The Multiplicative Uncertainty method proposed by Irikura et al. 10 (u u ~ oj'y a/ N/ ^ ^f) 
has been shown to be inconsistent when large frequency ranges are considered. It could not 



2 22,2 
u v = UJ u s + p 



(20) 




(21) 
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U) (HF/6-31G*) 


2 2 
^ Us 


<r 


v ± u v 


Freq. 


3000 


2.25 


2052.09 


2695±45 cm" 1 


ZPVE 


100 kJ moE 1 


0.029 


0.194 


98.12±0.47 kJ mol" 1 



Table 2: Uncertainty propagation with calibrated model for a set of 2738 vibrational frequencies 
extracted from the CCCBDB for the HF/6-31G* combination of theory/basis-set and for a set 
of 39 ZPVE of the Zl set for the B3LYP/6-31G* combination. 
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Figure 4: Evolution of measurement model parameters with the amplitude of an hypothetical 
uniform experimental measurement uncertainty p on ZPVE; B3LYP/6-31G* combination of 
theory/basis-set (green squares with error bars). The brown vertical dashed line indicates the 
value of the rms 7. Top panel: the red dashed lines represent the la confidence interval in 
the limit of null experimental uncertainty; the blue dashed line represent the la confidence 
interval in the weighted least squares limit. Bottom panel: the red dashed line represents the 
a 2 + p 2 = 7 2 law, truncated to positive values of a. 
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Summary stat. 



MU 



BMC 



s a s 7(kJ mol x ) u s %CI 95 u s <r(kJ mol x ) %CI 95 

HF/6-31G* 

Full set 0.9135 0.0607 

Calibration set 0.9078 
Validation set 

B3LYP/6-31G* 

Full set 0.9812 0.0375 0.42 0.0103 0.0017 0.44±0.05 

Calibration set 0.9825 0.45 0.0134 0.0032 0.48±0.08 

Validation set 0.68 1.00 



Table 3: Statistical estimates and validation for MU and BMC models for a set of 39 ZPVEs of the Zl set for the HF/6-31G* and B2LYP/6-31G* 
combinations of theory/basis-set. 



0.71 
0.77 



0.0161 
0.0214 



0.0027 
0.0052 



0.73±0.09 
0.83±0.14 



0.63 



0.95 



be recovered by the Bayesian approach, except when the dataset spans a restricted frequency 
range, in which case it is reduced to a trivial, unweighted, standard deviation. The use of the 
scaling factor uncertainties as reported presently (November 2008) in the CCCBDB 9 cannot 
therefore be recommended. 

I would like to stress out that the validity of the formulas for uncertainty propagation 
depends to some extent on the normality of the residuals z/j — suji of the linear regression. 
Inspection of histograms of residuals (see e.g. Fig.l in Ref. 7 ) shows that this is not always the 
case. The customary approach to choose an optimal theory/basis-set combination is to assess 
their performance by the rms alone, maybe weighted by computational cost considerations. 6-8 
One could also consider a "normality criterion" in order to reject theory/basis-set combinations 
providing non-normal residuals and from which the summary statistics cannot be used faithfully 
for uncertainty propagation. Analysis of restricted ranges of data as presently done by some 
authors for vibrations 8 is one way to improve the normality of residuals. 

Semi-empirical correction of ab-initio results by scaling is presently very common and effi- 
cient for many observables. It certainly would be a large step towards the general applicability 
of uncertainty propagation, if statistically pertinent estimators were systematically reported in 
the literature devoted to the calibration of these correction models. 
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A Appendix 

A.l Bayesian analysis of scaling factor calibration model 

We consider the measurement model 

vf» = au i + Z + e i (22) 
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where 6j ~ N(0,pf) is the measurement uncertainty of v° hs , and £ ~ iV(0,(x 2 ) is a discrepancy 
function between the linear model and the observations. This model has two unknown param- 
eters, s and a, to be estimated on a calibration dataset consisting of N calculated harmonic 
frequencies {uji}f =1 , and their corresponding observed frequencies {v° bs } _ . 

In the Bayesian data analysis framework, 12,15 all information about parameters can be ob- 
tained from the joint posterior pdf p ys,a\ {y° bs , Pi, ■ In order to simplify the notations, 
we will omit in the following the list indices when they are not necessary. This pdf is obtained 
through Bayes theorem 

P(s,a\ {v obs ,p,uj}) oc p({u} \s,a, {p,uj}) p(s,a) (23) 



where p({z/ ofes } \s, a, {p, u}) is the likelihood function and p(s, a) is the prior pdf. 

As the difference between observation and corrected frequency has a normal distribution 

v? s - sui ~ N(0, a 2 + pi) (24) 



the likelihood function for a single observed frequency is 

pivf^p^) = ( 27 r(a 2 + p 2 ))" 1/2 exp f- ^^+^l ^ 

Considering that all frequencies are measured independently (with uncorrelated uncertainty) 
the joint likelihood is the product of the individual ones, i.e. 

,({✓*} = n(^(^ + ^))' 1/2 exp (26) 

i=l \ t=l ™ / 

As there is a priori no correlation between s and a, we use a factorized prior pdf p(s, a) = 
p(s)p(a). In the absence of a priori quantified information on s, a uniform pdf p(s) = cte is 
used. For a, we have to consider a positivity constraint, and we use a Jeffrey's prior, p(a) oc a -1 . 
The posterior pdf is finally defined up to a proportionality factor which is irrelevant for the 
following developments 

N / 1 N / obs _ \2\ 

p(^ik s ,^p}) oc a- 1 n ^ + ^ 2 )" l/2 ^ -\ e + 7 ( 27 ) 

i=i \ i=i * / 
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A. 2 Case of negligible measurement uncertainties 

For the analysis of vibrational frequencies, it is generally considered that experimental uncer- 
tainties are negligible (pi = 0). The general expression for the posterior pdf (Eq. [271) can be 
simplified accordingly to 




Using the identity 

{vf - su,) 2 = (s- + iV 7 2 (29) 

i=i 

it can be written in a convenient factorized form 

P (s,a\{v° b *,u;}) oc exp (-^g) exp {- { " ~ ) (30) 

from which we can derive analytical estimates for the parameters and their uncertainties. 
A. 2.1 Estimation of s 

The marginal density for s is obtained by integration over a 



p(s\{v° bs ,uj}) = / dap(s,a\{v° bs ,uj}) (31) 
Jo 

oc J™ da a^" 1 exp (-± g - S ^) 2 j (32) 



oc 



TV \ "*/ 2 



E« 6S -^) (33) 



,i=i 



which can be rewritten as 

which has the shape of a Student's distribution 16 



(34) 



2\ -(n+l)/2 

Stt(x) oc { 1 4~ "—— j (35) 
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Posing n = N — 1 and x 2 = (N — 1)/N (s — s) 2 ^2 ^f/l 2 -, we can directly use the properties of 
the Student's distribution 

E[z] = 0; Var[x] = n/{n - 2) (36) 

to derive 



8 = 8 (37) 

u °^*w J km (38) 

A. 2. 2 Estimation of a 

The marginal density for the standard uncertainty of the stochastic variable £ is 

/oo 

dsp(s,a\{u obs ,u}) (39) 
-oo 

K ^v ex P(-^J ( 41 ) 



<7 



Using the formula 



o 



2 V 2 

one obtains readily the following estimates 



dxx- n e- a ' x2 = Ir(^— h /a^- 1 )/ 2 (42) 



<r = 7 (43) 

^ = 7A /^ r ^- 2 )/ 2 ] (44) 

7 V 2 r[(n-l)/2] 1 J 

^ = -^^ 7 2 (45) 

iV-3 ' K J 



N N /rf(n-2)/2l\' 



A. 3 Case of very large measurement uncertainties 

When model discrepancy is negligible compared to measurement uncertainties, i.e. when the 
standard linear model is in a statistical regime, one recovers standard statistical analysis, the 
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Bayesian analog to weighted least squares. The posterior pdf for s is then 

N / T N ( obs _ V 

p(s\{v° b °,u,p}) oc Il^exp j ) ( 47 ) 



i=i \ i=i Hl 



from which one obtains 



« = E(^W/E( W M (48) 

(49) 



For uniform experimental uncertainty over the dataset, the scaling factor uncertainty varies 
linearly with p 



A. 4 Uncertainty propagation 

In the Bayesian framework, the posterior pdf p(s,cr\ {v obs , p, cj}) can be used to estimate the 
uncertainty of predicted frequencies 

p(u'\p', to', {v obs , p, u;}) = / dsdap(v'\s, a, p f , u>')p(s, a\ {v obs , p, co}) (51) 



where 

p(u'\s^p\J) oc (a 2 + p' 2 )- 1 / 2 exp L ^^ j (52) 

results from the stochastic model (Eq. [T2l) . This integral has generally to be evaluated numer- 
ically. 
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